Provably Private Data Anonymization: Or, k-Anonymity Meets Differential Privacy
Authors
Abstract
Privacy-preserving microdata publishing currently lacks a solid theoretical foundation. Most existing techniques are developed to satisfy syntactic privacy notions such as k-anonymity, which fails to provide strong privacy guarantees. The recently proposed notion of differential privacy has been widely accepted as a sound privacy foundation for statistical query answering. However, no general practical microdata publishing techniques are known to satisfy differential privacy. In this paper, we start to bridge this gap. We first analyze k-anonymization methods and show how they can fail to provide sufficient protection against re-identification, the very attack they were designed to prevent. We then prove that k-anonymization methods, when performed "safely" and preceded by a random sampling step, can satisfy (ε, δ)-differential privacy with reasonable parameters. This result is, to our knowledge, the first to link k-anonymity with differential privacy and illustrates that "hiding in a crowd of k" indeed offers privacy guarantees. This naturally leads to future research in designing "safe" and practical k-anonymization methods. We observe that our result gives an alternative approach to output perturbation for satisfying differential privacy: namely, adding a random sampling step at the beginning and pruning results that are too sensitive to the change of a single tuple. This approach may be applicable to settings other than microdata publishing. We also show that adding a random-sampling step can greatly amplify the level of privacy guarantee provided by a differentially private algorithm. This result makes it much easier to provide strong privacy guarantees when one wishes to publish a portion of the raw data. Finally, we show that current definitions of (ε, δ)-differential privacy require δ to be very small to provide sufficient privacy protection when publishing microdata, making the notion impractical in some scenarios.
To address this problem, we introduce a notion called f-smooth (ε, δ)-differential privacy.
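The abstract's two algorithmic ideas — privacy amplification by random sampling, and "sample first, then k-anonymize by suppressing small groups" — can be sketched briefly. The sketch below, in Python, uses the standard amplification bound ε' = ln(1 + q·(e^ε − 1)) for Bernoulli sampling with rate q; the function names and the suppression routine are illustrative simplifications, not the paper's actual "safe" k-anonymization algorithm.

```python
import math
import random
from collections import Counter

def amplified_epsilon(eps: float, q: float) -> float:
    """Privacy amplification by sampling: running an eps-DP mechanism on a
    Bernoulli sample (each record kept with probability q) yields an
    eps'-DP mechanism with eps' = ln(1 + q * (e^eps - 1)) <= eps."""
    return math.log(1 + q * (math.exp(eps) - 1))

def sample_then_suppress(records, q, k, seed=0):
    """Toy version of the sample-then-anonymize pipeline: Bernoulli-sample
    the records with rate q, then release only those whose quasi-identifier
    value occurs at least k times in the sample (suppress the rest).
    Here each record is its own quasi-identifier value, e.g. a string."""
    rng = random.Random(seed)
    sample = [r for r in records if rng.random() < q]
    counts = Counter(sample)          # group sizes in the sample
    return [r for r in sample if counts[r] >= k]
```

For example, with ε = 1 and sampling rate q = 0.1, the amplified guarantee is ε' = ln(1 + 0.1·(e − 1)) ≈ 0.159 — a substantially stronger bound than the original ε = 1, which is the amplification effect the abstract refers to.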
Similar resources
Improving the Utility of Differential Privacy via Univariate Microaggregation
Differential privacy is a privacy model for anonymization that offers more robust privacy guarantees than previous models, such as k-anonymity and its extensions. However, it is often disregarded that the utility of differentially private outputs is quite limited, either because of the amount of noise that needs to be added to obtain them or because utility is only preserved for a restricted ty...
Utility-Preserving Differentially Private Data Releases Via Individual Ranking Microaggregation
Being able to release and exploit open data gathered in information systems is crucial for researchers, enterprises and the overall society. Yet, these data must be anonymized before release to protect the privacy of the subjects to whom the records relate. Differential privacy is a privacy model for anonymization that offers more robust privacy guarantees than previous models, such as k-anonym...
Privacy Consensus in Anonymization Systems via Game Theory
Privacy protection appears as a fundamental concern when personal data is collected, stored, and published. Several anonymization methods have been proposed to address privacy issues in private datasets. Every anonymization method has at least one parameter to adjust the level of privacy protection considering some utility for the collected data. Choosing a desirable level of privacy protection...
From t-closeness to differential privacy and vice versa in data anonymization
k-Anonymity and ε-differential privacy are two mainstream privacy models, the former introduced to anonymize data sets and the latter to limit the knowledge gain that results from the inclusion of one individual in the data set. Whereas basic k-anonymity only protects against identity disclosure, t-closeness was presented as an extension of k-anonymity that also protects against attribute discl...
Big Data Anonymization Method for Demand Response Services
A demand response service, as a smart grid application, produces and requires large amounts of information about electric power consumption. This data can be regarded as big data and needs to be anonymized, both to preserve privacy and to reduce its volume. Electric power consumption data must be used carefully because it contains private information. The proposed method can convert data to existen...
Journal:
- CoRR
Volume: abs/1101.2604
Pages: -
Publication date: 2010